Second-tier Cache Management to Support DBMS Workloads

Author

  • Xuhui Li
Abstract

Enterprise database management systems (DBMSs) often run on computers with dedicated storage systems. Their data access requests must pass through two tiers of cache, a database bufferpool and a storage server cache, before reaching the storage media (e.g., disk platters). A tremendous amount of work has been done to improve the performance of the first-tier cache, the database bufferpool, but comparatively little work has focused on managing second-tier caches to support DBMS workloads. In this thesis we propose several novel techniques for managing second-tier caches to improve DBMS performance in terms of query throughput and query response time.

The main purpose of second-tier cache management is to reduce the I/O latency endured by database query executions. This goal can be achieved by minimizing the number of reads and writes that second-tier caches issue to storage devices.

The first part of our research focuses on reducing the number of read I/Os issued by second-tier caches. We observe that DBMSs issue I/O requests for various reasons. The rationale behind each request is useful to a second-tier cache because it can be used to estimate the temporal locality of the requested data blocks, and the cache can exploit this estimate when making replacement decisions. We propose a technique to pass this information from DBMSs to second-tier caches and to use it to guide cache replacement.

The second part of this thesis focuses on reducing the number of writes issued by second-tier caches. Our work here is twofold. First, we observe that although computer systems contain second-tier caches, today's DBMSs cannot take full advantage of them. For example, most commercial DBMSs use forced writes to propagate bufferpool updates to permanent storage for durability reasons. We note that enforcing this practice for every write is more conservative than necessary.
Some of these writes can instead be issued as unforced requests and held in the second-tier cache without immediate synchronization, which gives the cache the opportunity to consolidate multiple writes into a single request. Unfortunately, the current POSIX-compliant file system interfaces provided by mainstream operating systems (e.g., Unix and Windows) are not flexible enough to support such dynamic synchronization. We propose extending these interfaces so that DBMSs can use unforced writes whenever possible.

Second, we observe that existing cache replacement algorithms are designed solely to maximize read cache hits (i.e., to minimize read I/Os), because read latency lies on the critical path of query execution. We argue that minimizing reads is not the only objective of cache replacement: when I/O bandwidth becomes the bottleneck, the objective should be to minimize the total number of I/Os, including both reads and writes. We propose associating a new type of replacement cost with each cache page, namely the total number of I/Os caused by replacing it, and we also present a partial characterization of an optimal algorithm which minimizes...
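
The read-side and write-side ideas in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the thesis's method. It uses hypothetical hint labels (`sequential_scan`, `random_read`, `bufferpool_evict`) with made-up re-reference priorities as a stand-in for the DBMS's request rationale. Forced writes are modeled as write-through, and unforced writes are modeled as dirty pages that the cache absorbs. On eviction, the cache removes the page with the lowest priority plus one extra unit of cost if the page is dirty. That greedy rule is only an illustrative heuristic, not the optimal algorithm the thesis characterizes. The class and method names (`SecondTierCache`, `read`, `write`, `flush`) are also hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical hint labels and re-reference priorities (higher = more likely to
# be reused soon). These values are illustrative assumptions, not from the thesis.
HINT_PRIORITY = {
    "sequential_scan": 0,   # scan pages: unlikely to be re-read soon
    "bufferpool_evict": 1,  # page just dropped from the first-tier cache
    "random_read": 2,       # point/index lookups: reuse more likely
}


@dataclass
class Page:
    priority: int
    dirty: bool = False
    last_use: int = 0


class SecondTierCache:
    """Write-back cache sketch that counts the I/Os it issues to storage."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pages: Dict[int, Page] = {}
        self.clock = 0
        self.disk_reads = 0
        self.disk_writes = 0

    def _victim(self) -> int:
        # Greedy heuristic: lowest (priority + 1 if dirty) wins, because evicting
        # a dirty page costs one extra write; ties go to the least recently used.
        return min(
            self.pages,
            key=lambda b: (self.pages[b].priority + (1 if self.pages[b].dirty else 0),
                           self.pages[b].last_use),
        )

    def _make_room(self) -> None:
        if len(self.pages) >= self.capacity:
            if self.pages.pop(self._victim()).dirty:
                self.disk_writes += 1  # dirty victim must be written back

    def read(self, block: int, hint: str) -> bool:
        """Return True on a cache hit; a miss costs one disk read."""
        self.clock += 1
        hit = block in self.pages
        if not hit:
            self.disk_reads += 1
            self._make_room()
            self.pages[block] = Page(priority=0)
        page = self.pages[block]
        page.priority = HINT_PRIORITY.get(hint, 1)  # unknown hints get a middle priority
        page.last_use = self.clock
        return hit

    def write(self, block: int, forced: bool) -> None:
        """Forced writes go straight to storage; unforced ones stay dirty in cache."""
        self.clock += 1
        if block not in self.pages:
            self._make_room()
            self.pages[block] = Page(priority=1)
        page = self.pages[block]
        page.last_use = self.clock
        if forced:
            self.disk_writes += 1
            page.dirty = False
        else:
            page.dirty = True  # repeated unforced writes are consolidated here

    def flush(self) -> None:
        """Write back every dirty page once (e.g., at a checkpoint)."""
        for page in self.pages.values():
            if page.dirty:
                self.disk_writes += 1
                page.dirty = False


if __name__ == "__main__":
    cache = SecondTierCache(capacity=2)
    cache.read(1, "random_read")
    cache.read(2, "sequential_scan")
    cache.read(3, "random_read")          # evicts block 2, the scan page
    print(cache.read(1, "random_read"))   # True: block 1 survived

    sync_cache, lazy_cache = SecondTierCache(4), SecondTierCache(4)
    for _ in range(3):
        sync_cache.write(7, forced=True)
        lazy_cache.write(7, forced=False)
    lazy_cache.flush()
    print(sync_cache.disk_writes, lazy_cache.disk_writes)  # 3 1
```

Running the module shows both effects. First, the scan page is evicted ahead of the randomly read page, whereas plain LRU would have evicted block 1. Second, three unforced writes to one block cost a single disk write at flush time, compared with three disk writes when every write is forced.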

Similar resources

Caching for flash-based databases and flash-based caching for databases

Database storage systems today are primarily based on two technologies: HDD (hard disk drive) and DRAM (dynamic random-access memory). It is increasingly difficult for these systems to deliver acceptable performance, due to fast expanding data volume, growing energy concern, and cost constraints. The emergence of flash memory has made cost-effective solutions possible. However, conventional sto...

Integrating SSD Caching into Database Systems

Flash-based solid state storage devices (SSDs) are now becoming commonplace in server environments. In this paper, we consider the use of SSDs as a persistent second-tier cache for database systems. We argue that it is desirable to change the behavior of the database system’s buffer cache when a second-tier SSD cache is used, so that the buffer cache is aware of which pages are in the SSD cache...

On the Performance of Fetch Engines Running DSS Workloads

This paper examines the behavior of current and next generation microprocessors’ fetch engines while running Decision Support Systems (DSS) workloads. We analyze the effect of the latency of instructions being fetched, their quality and the number of instructions that the fetch engine provides per access. Our study reveals that a well dimensioned fetch engine is of great importance for DSS perf...

Appendix: Collection of Base Models

Analytical models are mathematical models built after a thorough analysis of the underlying system. The models work as follows: they use information about the workload and the underlying system to predict the performance for different configurations. For example, an analytical model of the cache requires a trace of cache accesses, the cache replacement policy, and the cost of cache hit and miss...

Performance Improvement In DBMS

The type of the workload on a database management system (DBMS) is a key consideration in tuning its performance. Allocations for resources such as main memory can be very different depending on whether the workload type is Online Transaction Processing (OLTP) or Decision Support System (DSS). Database administrators must, therefore, recognize the significant shifts of workload type that demand...

Publication date: 2011